Multi-View Co-Training of Transliteration Model
Authors
Abstract
This paper presents a new approach to training a transliteration model from unlabeled data for transliteration extraction. We begin with an inquiry into the formulation of the transliteration model, treating different transliteration strategies as a multi-view problem in which each view exploits a natural division of transliteration features, such as phoneme-based, grapheme-based, or hybrid features. We then introduce a multi-view co-training algorithm, which leverages compatible and partially uncorrelated information across the different views to effectively boost the model from unlabeled data. Applying this algorithm to transliteration extraction, the results show that it not only circumvents the need for data labeling, but also achieves performance close to that of supervised learning, where manual labeling is required for all training samples.
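As a concrete illustration of the co-training loop described above, the following is a minimal sketch, not the paper's implementation: it assumes scikit-learn logistic-regression classifiers, one hypothetical feature-extraction function per view (for example, phoneme-based and grapheme-based), a seed labeled set containing both classes, and illustrative values for the number of rounds and the per-view selection size.

import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(labeled, unlabeled, views, rounds=10, top_k=50):
    """labeled: list of (candidate_pair, 0/1) seed examples; unlabeled: list of
    candidate pairs; views: one feature-extraction function per view."""
    data = list(labeled)
    pool = list(unlabeled)
    models = [LogisticRegression(max_iter=1000) for _ in views]

    for _ in range(rounds):
        # Train one classifier per view on the current (pseudo-)labeled set.
        y = np.array([label for _, label in data])
        for model, view in zip(models, views):
            X = np.array([view(x) for x, _ in data])
            model.fit(X, y)
        if not pool:
            break
        # Each view pseudo-labels the unlabeled pairs it is most confident about;
        # those labels are shared with all views in the next round.
        picked = set()
        for model, view in zip(models, views):
            proba = model.predict_proba(np.array([view(x) for x in pool]))
            confidence = proba.max(axis=1)
            for i in np.argsort(-confidence)[:top_k]:
                if i not in picked:
                    picked.add(i)
                    data.append((pool[i], model.classes_[proba[i].argmax()]))
        pool = [x for i, x in enumerate(pool) if i not in picked]

    return models

In use, each entry of views would map a candidate source-target pair to a numeric feature vector; after training, any one of the returned classifiers (or a vote over them) can score new candidate pairs for extraction.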
Related papers
A Bayesian model of bilingual segmentation for transliteration
In this paper we propose a novel Bayesian model for unsupervised bilingual character sequence segmentation of corpora for transliteration. The system is based on a Dirichlet process model trained using Bayesian inference through blocked Gibbs sampling implemented using an efficient forward filtering/backward sampling dynamic programming algorithm. The Bayesian approach is able to overcome the o...
English-Korean Named Entity Transliteration Using Substring Alignment and Re-ranking Methods
In this paper, we describe our approach to the English-to-Korean transliteration task in NEWS 2012. Our system mainly consists of two components: a letter-to-phoneme alignment with m2m-aligner, and the DirecTL-p transliteration training model. We construct different parameter settings to train several transliteration models. Then, we use two re-ranking methods to select the best transliteration among th...
Bayesian Co-Training
We propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. Building upon new insights from this model, we propose an improved method for co-training, whi...
Statistical models for unsupervised, semi-supervised and supervised transliteration mining
We present a generative model that efficiently mines transliteration pairs in a consistent fashion in three different settings: unsupervised, semi-supervised, and supervised transliteration mining. The model interpolates two sub-models, one for the generation of transliteration pairs and one for the generation of non-transliteration pairs (i.e. noise). The model is trained on noisy unlabelled da...
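A minimal sketch of the interpolation described above, under assumed interfaces: p_translit and p_noise are hypothetical functions scoring a candidate pair under the transliteration and noise sub-models, and lam is an illustrative interpolation weight, none of which are taken from that paper.

def interpolated_prob(pair, p_translit, p_noise, lam=0.5):
    # Mixture of the two sub-models:
    # P(pair) = lam * P_translit(pair) + (1 - lam) * P_noise(pair)
    return lam * p_translit(pair) + (1.0 - lam) * p_noise(pair)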
Transliteration Mining with Phonetic Conflation and Iterative Training
This paper presents transliteration mining on the ACL 2010 NEWS workshop shared transliteration mining task data. Transliteration mining was done using a generative transliteration model applied to the source language, with its output constrained to the words in the target language. A total of 30 runs were performed on 5 language pairs, with 6 runs for each language pair. In the presence of...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2008